APPARATUS AND METHOD FOR USING A
TARGET BASED COMPUTER VISION 
SYSTEM FOR USER INTERACTION 



BACKGROUND OF THE INVENTION 

Field of the Invention 

The present invention generally relates to a method and apparatus to 
recognize actions of a user and to have those actions correspond to specific 
computer functions. 

Description of the Related Art 

Many people find it difficult to use computers because of problems 
operating current input devices such as a keyboard or a mouse. This includes 
users with physical disabilities, young children and people who have never 
learned to use a computer but find themselves confronted with one at a public
kiosk or similar workstation. Alternative input devices have been developed, such
as touch screens and single-switch devices, but these have two distinct
disadvantages. First, they rely on physical devices that often must be carefully
set up for the user, and are prone to damage or vandalism in public places.
Second, they do not allow the full range of expression needed to effectively
interact with current computer applications. 

It is therefore desirable to provide a means to allow people to use 
computer systems without the use of physical interface devices. Many methods 
have been explored to allow users to interact with machines by way of gestures
and movements using cameras. Most of these methods, however, have one or
more of several limitations: they recognize only a very limited range of gestures,
they require extensive customization for a specific user, they are not robust to
changing environmental conditions, they are not reliable, or they require extensive
user training. A robust, flexible and user-friendly method and apparatus is needed
to allow a computer to recognize a wide range of user actions using a camera.

SUMMARY OF THE INVENTION 

In view of the foregoing and other problems of the conventional methods, 
it is, therefore, an object of the present invention to provide a structure and 
method for training a computer system to recognize specific actions of a user. 

The method may include displaying an image of a user within a window
on a screen. The window may include a target area. The method may also
include associating a first computer event with a first user action displayed in the
target area and storing information in a memory device such that the first user
action is associated with the first computer event.

A system is also provided that may associate specific user actions with 
specific computer commands. The system may include an image capture system 
that captures an image of a user. The system may also include an image display
system that displays said image captured by the image capture system within a 
window on a display screen. The system may recognize the specific user actions 
and associate the specific user actions with the specific computer commands. 

Other objects, advantages and salient features of the invention will become 
apparent from the following detailed description taken in conjunction with the
annexed drawings, which disclose preferred embodiments of the invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 



The invention will be described in detail with reference to the following 
drawings in which like reference numerals refer to like elements and wherein: 
Fig. 1 shows the image capture system and display device according to the
present invention;

Figs. 2A and 2B show two different positions of the user and the target 
area within a window on a display according to the present invention; 

Fig. 3 is a flowchart showing a preferred method of the present invention; 

Fig. 4 shows a hardware configuration of an information
handling/computer system for operating the present invention; and

Fig. 5 shows a magnetic data storage diskette that may store a program 
according to the present invention. 




DETAILED DESCRIPTION OF PREFERRED 
EMBODIMENTS OF THE INVENTION 



As will be described below in greater detail, the system and method
according to the present invention allow a user to define a target (or
target region) in an image provided by a camera (i.e., an image capture system).
The user is allowed to describe or demonstrate what it means to activate the target
and to define a specific computer function to be associated with that activation
(e.g., a mouse click). Thereafter, whenever the system detects activation of the
target, the computer system will perform the associated function. The system may
detect activation of the target by: (1) detecting a state, (2) detecting a motion or
(3) detecting intersection between an object in the scene and the target. These ways
of activating the target will be discussed below in greater detail. One skilled in the
art would understand that these methods of activating the target are illustrative
only and are not meant to be limiting. That is, other methods of activating the
target are also within the scope of the present invention.

The present invention will now be described with reference to the
accompanying figures. Fig. 1 shows a display device 10 having a display screen
12. The display device 10 preferably is a conventional computer screen display
and is connected to a conventional computer system as will be described below.
An image capture system 20 may be positioned at any location from which it
captures the image of the user. In one embodiment, the display device 10 and
image capture system 20 may be provided in public kiosks.

As shown in Fig. 1, the display screen 12 includes a window 30 that 
displays a picture of the image that is captured by the image capture system 20. 
The window 30 is shown in the lower left-hand corner of the display screen 12, but
it is not limited to this location. Rather, the window 30 may be displayed at any 
appropriate location on the screen 12. The computer system will include the 
appropriate hardware and software components that will connect the image 
capture system 20 to the computer system such that the image captured by the 
image capture system 20 is displayed within the window 30. The display screen 
12 will also show a normal computer screen and will provide computer prompts in 
a normal manner. 

The computer system preferably includes a training phase that occurs
before the system begins normal operation. The system operates based on
software and/or hardware connected to the computer system. In the training phase,
the image capture system 20 captures an image of the user, and the system displays
the left-to-right "reversed" image within the window 30 on the display screen 12.
The user sees himself in much the same way as he would see himself in a mirror.
In the training phase, the
computer system is programmed to produce a visible target area 32 within the 
window 30 such as shown in Figs. 2A and 2B. The target area 32 is preferably 
movable about the window 30 using a pointing device or speech recognition 
system. The system is also capable of locating another target area 32 within 
another area of the window 30. The computer system will be trained to associate 
specific user movements/actions (or gestures) with respective computer functions. 
The user preferably demonstrates or describes the action. Demonstration occurs, 
for example, by the user hitting a key when the action is being performed, or just 
before and just after the action is performed. Features relating to the 
movements/actions will be stored in a memory device so the system will 
recognize the specific movements/actions after the training phase and associate 
the movements/actions with the respective computer function. 
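The demonstration step can be sketched as selecting the frames captured between two key presses and reducing each one to stored features. This is a minimal Python sketch under stated assumptions: frames arrive as hypothetical `(timestamp, frame)` pairs, and `extract` stands for whatever feature extractor is used (for example, a color histogram). None of these names come from the description above.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

F = TypeVar("F")  # frame type (e.g. a 2-D list of pixels)
R = TypeVar("R")  # feature type produced by the extractor

def record_demonstration(
    timed_frames: Sequence[Tuple[float, F]],
    start: float,
    end: float,
    extract: Callable[[F], R],
) -> List[R]:
    """Return features of the frames captured while the action was demonstrated.

    timed_frames: hypothetical (timestamp, frame) pairs from the image capture system.
    start, end: times of the key presses just before and just after the action.
    extract: hypothetical feature extractor whose results are stored in memory.
    """
    if end < start:
        raise ValueError("the demonstration must end after it starts")
    # Keep only frames inside the demonstrated interval, then summarize each one.
    return [extract(frame) for t, frame in timed_frames if start <= t <= end]
```

When the user instead presses a single key while the action is being performed, `start` and `end` could be a short window around that key press.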

In the training phase, the image capture system 20 initially displays an 
image of the user within the window 30. Features of this image may be stored in 
a memory device in order to help in recognizing the future movements/actions of 
the user. The target area 32 is displayed in the window 30 and may be positioned 
by the user or may be automatically positioned by the system. While the target 
area 32 is displayed within the window 30, one skilled in the art would understand 
that the target area 32 corresponds to a specific area as seen by the image capture
system 20. 
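Because the window shows a mirror-reversed and possibly scaled copy of the camera image, a target drawn in the window has to be mapped back to camera pixels before it can be tested. The following Python sketch assumes that the window displays the whole camera frame, flipped left-to-right and scaled independently in each axis; the function names are illustrative only.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)

def mirror(frame: Sequence[Sequence[int]]) -> List[List[int]]:
    """Reverse each row so the user sees himself as in a mirror."""
    return [list(row)[::-1] for row in frame]

def window_to_camera(rect: Rect, window_size: Tuple[int, int],
                     camera_size: Tuple[int, int]) -> Rect:
    """Map a target rectangle drawn in the mirrored window to camera-image pixels."""
    x, y, w, h = rect
    win_w, win_h = window_size
    cam_w, cam_h = camera_size
    sx, sy = cam_w / win_w, cam_h / win_h
    # The window's right edge shows the camera's left edge, so flip the x coordinate.
    cam_x = cam_w - (x + w) * sx
    return (round(cam_x), round(y * sy), round(w * sx), round(h * sy))
```

For a 320x240 window showing a 640x480 camera, a target in the window's upper-left corner maps to the camera's upper-right corner, which is what the mirror display implies.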

The target area 32 may be a rectangular shape, a square shape, a circle 
shape or any other type of shape. As discussed above, the user may locate or 
relocate the target area 32 about the window 30 using a pointing device or speech 
recognition system. The user may also manipulate the size (i.e., the dimensions) 
of the target area using respective input commands. The user may also define 
more than one target area, each with its associated function. 
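Rectangular and circular targets can share a small interface: a containment test that decides which camera pixels belong to the target, and a resize operation for the user's size commands. This Python sketch is illustrative; the class names and the choice to resize about the shape's center are assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class RectTarget:
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def resized(self, factor: float) -> "RectTarget":
        # Scale about the center so the target stays where the user placed it.
        cx, cy = self.x + self.w / 2, self.y + self.h / 2
        nw, nh = self.w * factor, self.h * factor
        return RectTarget(cx - nw / 2, cy - nh / 2, nw, nh)

@dataclass(frozen=True)
class CircleTarget:
    cx: float
    cy: float
    r: float

    def contains(self, px: float, py: float) -> bool:
        return math.hypot(px - self.cx, py - self.cy) <= self.r

    def resized(self, factor: float) -> "CircleTarget":
        return CircleTarget(self.cx, self.cy, self.r * factor)

def target_pixels(target, width: int, height: int) -> List[Tuple[int, int]]:
    """(x, y) of every image pixel whose center lies inside the target."""
    return [(x, y) for y in range(height) for x in range(width)
            if target.contains(x + 0.5, y + 0.5)]
```

Any other shape can be supported by giving it the same `contains` and `resized` methods.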

The computer system preferably prompts the user to perform an action
within the target area 32 that will be associated with a specific computer function.
For example, in Fig. 2A, the user is shown as tilting his head to the left within the
window 30. When the head is tilted into the target area 32, the computer system
recognizes the action within the target area 32. For example, the system may use
background averaging, template or color predicate methods to recognize when
part of a user's body is within the "target". Other methods will be described
below. The training image(s) or features extracted from the training image(s),
such as a summary, may then be stored in a memory device so that the computer
system may recognize this movement in the future. While Fig. 2A shows movement
of the head, this example is not meant to limit the types of movements/actions
(gestures) of a user. Rather, the user may move a hand, another part of the user's
body or another object within the target area 32.
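Background averaging, one of the methods named above, can be sketched as follows: keep a running average of the scene and treat the target as occupied when enough of its pixels differ from that average. The Python below assumes grayscale frames stored as 2-D lists; the update rate `alpha`, the difference threshold and the fill fraction are illustrative values, not values taken from this description.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]
Region = Tuple[int, int, int, int]  # (x, y, width, height)

def update_background(background: Frame, frame: Frame,
                      alpha: float = 0.05) -> List[List[float]]:
    """Running average: each pixel moves a fraction `alpha` toward the new frame."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

def target_occupied(background: Frame, frame: Frame, region: Region,
                    diff_threshold: float = 30, fill_fraction: float = 0.3) -> bool:
    """True when enough pixels inside the target differ from the averaged background."""
    x, y, w, h = region
    changed = sum(1 for r in range(y, y + h) for c in range(x, x + w)
                  if abs(frame[r][c] - background[r][c]) > diff_threshold)
    return changed >= fill_fraction * w * h
```

A common refinement is to update the background only while the target is empty, so that a user who stays in the target is not gradually absorbed into the background.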

The movement/action will then be associated with a specific computer
function. For example, the user may be asked to designate that specific
movement/action within the target area 32 as corresponding to a specific computer
function. One preferred computer function is that of a mouse click or the
movement of a pointing device on the screen (e.g., movement of a cursor to the
left or right). The movement may also correspond to other computer functions
and/or letters/symbols. The computer system associates the inputted computer
function (e.g., a mouse click) with the movement/action shown within the
target area 32.

In a subsequent step of the training phase, another target area 32 may be 
placed in a new location as shown in Fig. 2B. In this figure, the user performs a 
second movement/action such as tilting the head to the right as viewed in the 
window 30. The altered image of the user within the target area 32 may be stored 
in a memory device. The computer system may again ask for (or automatically 
provide) a respective computer function that will correspond to this movement. 
The association is stored in the memory device for future use. 
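Storing the association for future use can be as simple as writing the training records to a file. This Python sketch assumes a hypothetical JSON record layout holding the target rectangle, the stored features and the name of the associated computer function; the field names are illustrative.

```python
import json
from typing import Any, Dict, List, Optional, Sequence

# Hypothetical layout: {"target": [x, y, w, h], "features": [...], "function": "mouse_click"}
Record = Dict[str, Any]

def save_training(path: str, records: List[Record]) -> None:
    """Write the training records to the memory device."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)

def load_training(path: str) -> List[Record]:
    """Read the records back before the normal operation phase."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def function_for(records: List[Record], target: Sequence[int]) -> Optional[str]:
    """Name of the computer function trained for a target rectangle, if any."""
    for rec in records:
        # JSON stores tuples as lists, so compare as lists.
        if rec["target"] == list(target):
            return rec["function"]
    return None
```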

The above example describes that the user performs a movement/action 
and then the user provides a corresponding computer function that will be 
associated with that movement. However, the computer system may 
automatically provide a respective computer function based on either a first, 
second or third movement or may merely automatically associate different 
movements with a specific computer function. What is important is that the
movement of the user is associated with a respective computer function either 
automatically or based on the user requested function. 

This training phase continues so that the computer system will be trained
to perform different computer functions based on different movements/actions
of the user.

Once the user has appropriately trained the computer system to recognize 
his/her movements/actions (or gestures), the computer system may enter a normal 
operation phase while running a specific program/game/utility in which it will 
recognize the user's movements, such as a head tilt to the right or to the left and 
will associate that movement with the previously stored computer function. The 
computer system accomplishes this by capturing the image of the user using the 
image capture system 20. The image of the user is preferably displayed on the 
display screen 12 so the user will see his/her own movements. The target area 32 
does not need to be displayed in this mode. The user may then perform normal 
computer operations in a similar manner to a person using a mouse or other 
pointing device because the computer has been trained to recognize specific 
functions based on the movements of the user. 
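The normal operation phase reduces to a loop that tests every trained target on each captured frame and runs the associated function. In this Python sketch the activation tests and functions are passed in as hypothetical callables; firing only when a target changes from inactive to active is an assumed design choice, so that holding a pose produces one click rather than one click per frame.

```python
from typing import Callable, Dict, Iterable, TypeVar

F = TypeVar("F")  # frame type

def run_operation_phase(
    frames: Iterable[F],
    tests: Dict[str, Callable[[F], bool]],
    functions: Dict[str, Callable[[], None]],
) -> None:
    """Dispatch trained computer functions while frames arrive.

    tests: target name -> activation test (state, motion or intersection).
    functions: target name -> computer function learned during training.
    """
    was_active = {name: False for name in tests}
    for frame in frames:
        for name, is_active in tests.items():
            active = is_active(frame)
            # Fire on the rising edge only (inactive -> active).
            if active and not was_active[name]:
                functions[name]()
            was_active[name] = active
```

Functions that should repeat while a target stays active, such as continuous cursor movement, would instead be called on every active frame.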

Fig. 3 is a flowchart showing the inventive methodology of the training
phase according to the present invention. In step S100, a user image is displayed
on the screen 12 within the window 30. Subsequently, the target area 32 is
displayed in the window 30 in step S104. The user image is altered, preferably by
user movement in the target area 32, in step S106. Data from the training image(s)
may be stored in step S108. The stored data is then associated with a specific
computer function in step S110. This association may be stored in memory in
step S112. Other movements/actions may be associated with additional functions
by producing another target area 32 in step S116 and repeating steps S106-S114
for that new target area 32. It is understood that while Fig. 3 shows a preferred
method of the present invention, the number of steps and the order of the steps
need not be the same as shown in Fig. 3.
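The flow of Fig. 3 can be expressed as a loop over target areas. The Python sketch below assumes a hypothetical `ui` object that wraps the camera, screen and prompts; its method names are invented for illustration, and the step labels in the comments refer to Fig. 3.

```python
from typing import Any, Dict, List, Protocol

class TrainingUI(Protocol):
    """Hypothetical interface to the camera, display and user prompts."""
    def show_user_image(self) -> None: ...
    def display_target(self) -> Any: ...
    def capture_demonstration(self, target: Any) -> List[Any]: ...
    def extract_features(self, frames: List[Any], target: Any) -> Any: ...
    def ask_for_function(self) -> str: ...
    def another_target(self) -> bool: ...

def training_phase(ui: TrainingUI, memory: List[Dict[str, Any]],
                   max_targets: int = 10) -> List[Dict[str, Any]]:
    ui.show_user_image()                                # S100: show the user in the window
    for _ in range(max_targets):
        target = ui.display_target()                    # S104 (new target of S116 on later passes)
        frames = ui.capture_demonstration(target)       # S106: user acts in the target
        features = ui.extract_features(frames, target)  # S108: data from training images
        function = ui.ask_for_function()                # S110: associate a computer function
        memory.append({"target": target, "features": features,
                       "function": function})           # S112: store the association
        if not ui.another_target():                     # S116: produce another target?
            break
    return memory
```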

Figure 4 illustrates a typical hardware configuration of an information
handling/computer system for operating the present invention. Such a system
preferably has at least one processor or central processing unit (CPU) 300. The
CPU 300 may be interconnected via a system bus 301 to a random access memory
(RAM) 302, read-only memory (ROM) 303, input/output (I/O) adapter 304 (for
connecting peripheral devices such as disk units 305 and tape drives 306 to the
bus 301), communication adapter 307 (for connecting an information handling
system to a data processing network), user interface adapter 308 (for connecting a
keyboard 309, microphone 310, mouse 311, speaker 312, image capture system 20
and/or other user interface device to the bus 301), and display adapter 313 (for
connecting the bus 301 to a display device 10). As is understood by one skilled in
the art, the program that executes the above-described method according to the
present invention may be stored on a floppy disk or within the memory.

Such a method as described above may be implemented, for example, by 
operating a computer, as embodied by a digital data processing apparatus, to 
execute a sequence of machine-readable instructions. These instructions may 
reside in various types of signal-bearing media. 

Thus, the present invention may be directed to a programmed product, 
including signal-bearing media tangibly embodying a program of machine- 
readable instructions executable by a digital data processor. 

These signal-bearing media may include, for example, a random access
memory (RAM), such as a fast-access storage contained within the
computer. Alternatively, the instructions may be contained in another
signal-bearing medium, such as a magnetic storage diskette 900 shown exemplarily
in Figure 5, directly or indirectly accessible by the computer.

Whether contained in the diskette, the computer, or elsewhere, the 
instructions may be stored on a variety of machine-readable data storage media, 
such as DASD storage (e.g., a conventional "hard drive" or a RAID array), 
magnetic tape, electronic read-only memory (e.g., ROM, EPROM, or EEPROM), 
an optical storage device (e.g., CD-ROM, WORM, DVD, digital optical tape, 
etc.), paper "punch" cards, or other suitable signal-bearing media including 
transmission media such as digital and analog communication links and
wireless links. In an illustrative embodiment of the invention, the machine-readable
instructions may comprise software object code, compiled from a suitable
language.

YOR.132
Y0999-421

One alternative to this invention is the use of a touch screen and various
on-screen cues or targets. This has the drawbacks that a touch screen is less
durable than a camera (which can be located safely behind a small hole in the
display bezel or kiosk facing) and can only be used at close range.

As discussed above, the present invention has at least three different
methods to activate the target. These are: (1) detecting a state, (2) detecting a
motion, or (3) detecting intersection between an object in the scene and the target.
These methods will now be discussed in greater detail. Other methods of
activating the target are also within the scope of the present invention, as these
methods are merely illustrative.

The first method to activate the target is detecting a state, such as a pattern of
color. During the training phase, the state of the target 32 may be demonstrated to
the system. A summary of this color pattern (e.g., a color histogram) may be saved.
The color pattern of each subsequent image during the normal operational phase
may be matched to the one from the training phase, and if it is sufficiently close,
then the target will be considered activated. Alternatively, two states may be
demonstrated during training (e.g., target empty and target full). The new patterns
may be matched to each of those, and the system will decide which is closer.
Other types of states may also be used, such as statistical properties (e.g., average
pixel intensity), shape properties (e.g., the number or pattern of lines), and possibly
others.
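A state detector based on a color summary might look like the following Python sketch. It assumes RGB pixels given as `(r, g, b)` tuples, uses a coarse joint histogram as the summary and histogram intersection as the closeness measure; the bin count, the 0.7 threshold and the function names are assumptions, and other comparisons (for example, a chi-squared distance) would serve as well.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]

def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # c * bins // 256 maps 0..255 onto 0..bins-1.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
    return [v / len(pixels) for v in hist]

def intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def matches_trained(current: Sequence[float], trained: Sequence[float],
                    threshold: float = 0.7) -> bool:
    """Single-state variant: activated when close enough to the trained summary."""
    return intersection(current, trained) >= threshold

def closer_state(current: Sequence[float], empty: Sequence[float],
                 full: Sequence[float]) -> str:
    """Two-state variant: pick whichever demonstrated state is closer."""
    return "full" if intersection(current, full) > intersection(current, empty) else "empty"
```

The two-state variant avoids choosing a threshold at all, which is one reason to demonstrate both the empty and the full target.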

A second method of activating the target is detecting a motion. During
training, the user may either describe or demonstrate a motion within the target
32. This may be, for example, a left-to-right motion. The system may monitor
the motion within the target 32 during subsequent images using standard
techniques (e.g., optical flow, feature tracking, etc.) and produce summaries of that
motion. When a motion within the target matches the originally trained motion,
the target will be considered activated. To demonstrate the motion during training,
the user may prime the system by some standard technique, such as a menu item.
Then, the user may perform the motion (e.g., wave a hand through the target)
and then tell the system that the motion is complete. The system monitors the
motion within the target during that time interval, and produces and saves a
summary of that motion. The system then monitors and matches the motion
summary as described above.
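Optical flow and feature tracking are the standard techniques named above; as a simpler stand-in, the Python sketch below summarizes motion by tracking the centroid of changed pixels (frame differencing) inside the target and compares directions with cosine similarity. The frame representation, thresholds and function names are assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[Sequence[int]]     # grayscale rows
Region = Tuple[int, int, int, int]  # (x, y, width, height)

def changed_centroid(prev: Frame, cur: Frame, region: Region,
                     threshold: int = 30) -> Optional[Tuple[float, float]]:
    """Centroid of pixels in the target that changed between two frames, or None."""
    x0, y0, w, h = region
    xs: List[int] = []
    ys: List[int] = []
    for r in range(y0, y0 + h):
        for c in range(x0, x0 + w):
            if abs(cur[r][c] - prev[r][c]) > threshold:
                xs.append(c)
                ys.append(r)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def summarize_motion(frames: Sequence[Frame], region: Region) -> Tuple[float, float]:
    """Net displacement of the changed-pixel centroid over the demonstrated interval."""
    pts = [changed_centroid(a, b, region) for a, b in zip(frames, frames[1:])]
    pts = [p for p in pts if p is not None]
    if len(pts) < 2:
        return (0.0, 0.0)
    return (pts[-1][0] - pts[0][0], pts[-1][1] - pts[0][1])

def motion_matches(observed: Tuple[float, float], trained: Tuple[float, float],
                   min_cosine: float = 0.8) -> bool:
    """True if the observed motion points in roughly the trained direction."""
    no, nt = math.hypot(*observed), math.hypot(*trained)
    if no == 0 or nt == 0:
        return False
    cos = (observed[0] * trained[0] + observed[1] * trained[1]) / (no * nt)
    return cos >= min_cosine
```

During training, `summarize_motion` would be applied to the frames of the demonstrated interval and its result saved; during operation, the same summary over a recent window of frames is compared with `motion_matches`.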

A third method of activating the target may be by intersection detection. 
During training, the user may identify an object in the scene. The system may 
then track that object in subsequent images. When that object intersects the 
target, the target is considered activated. Object tracking can be done using a 
variety of existing techniques. One class of tracking techniques involves 
identifying the object explicitly for the system. The system then extracts 
information that uniquely identifies the object (e.g., color pattern, pixel template,
etc.). Subsequent images are searched for that object using color segmentation,
template matching, etc. A second class involves showing the system a
background image during training that does not contain the object. The system
compares subsequent images to the background image to determine if the object is
present. In either case, the system may identify the boundaries of the object being
tracked and determine if it overlaps the target 32.
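For the second class of tracking techniques (comparison against a background image), intersection detection can be sketched in Python as finding the bounding box of the pixels that differ from the background and testing it for overlap with the target. Representing the object's boundary as a bounding box is a simplifying assumption, and the threshold value is illustrative; a per-pixel mask would give a tighter test.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[Sequence[int]]
Rect = Tuple[int, int, int, int]  # (x, y, width, height)

def object_bbox(background: Frame, frame: Frame, threshold: int = 30) -> Optional[Rect]:
    """Bounding box of pixels that differ from the background image, or None."""
    xs: List[int] = []
    ys: List[int] = []
    for r, (brow, frow) in enumerate(zip(background, frame)):
        for c, (b, f) in enumerate(zip(brow, frow)):
            if abs(f - b) > threshold:
                xs.append(c)
                ys.append(r)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)

def rects_overlap(a: Rect, b: Rect) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def target_intersected(background: Frame, frame: Frame, target: Rect) -> bool:
    """The target is activated when the tracked object overlaps it."""
    box = object_bbox(background, frame)
    return box is not None and rects_overlap(box, target)
```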

As a further embodiment, the target region may be used for more than 
binary activation. That is, rather than being activated or not, the target may be 
determined to be in a number of states. Each of these states may have a different 
computer function associated with it. For example, the motion (e.g., a hand in the
target) within the target may be summarized as an average motion vector. As one 
example, the cursor on the screen may then be moved by a direction and an 
amount derived by passing that vector through an appropriate transfer function. 
Further, if a user's hand (or other type of activating object) remains within the 
target region beyond a predetermined time, then the cursor on the screen would 
continue to move in a left direction, for example. Other variations are also 
apparent to one skilled in the art. 
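The transfer function is not specified above. One common choice, shown here purely as an assumption, combines a dead zone that ignores jitter, a linear gain and a cap on the step size; the dwell rule keeps moving the cursor left once the hand has stayed in the target for more than a given number of frames. All names and parameter values are illustrative.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]

def transfer(vector: Vec, gain: float = 4.0, dead_zone: float = 0.5,
             max_step: float = 40.0) -> Vec:
    """Map an average motion vector (pixels/frame) to a cursor displacement."""
    vx, vy = vector
    mag = math.hypot(vx, vy)
    if mag <= dead_zone:
        return (0.0, 0.0)  # ignore small jitter
    step = min(gain * mag, max_step)  # linear gain, clamped
    return (vx / mag * step, vy / mag * step)

def dwell_move(frames_in_target: int, dwell_frames: int = 15, speed: float = 5.0,
               direction: Vec = (-1.0, 0.0)) -> Vec:
    """Keep moving (left by default) once the object has dwelt past the limit."""
    if frames_in_target <= dwell_frames:
        return (0.0, 0.0)
    return (direction[0] * speed, direction[1] * speed)
```

The dead zone and the cap trade responsiveness for stability; a nonlinear gain (for example, one that grows with speed) is another option.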

While the invention has been described with reference to specific 
embodiments, the description of the specific embodiments is illustrative only and 
is not to be considered as limiting the scope of the invention. Various other 
modifications and changes may occur to those skilled in the art without departing
from the spirit and scope of the invention.